{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 基于streamlit的大模型应用开发3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#导入语言模型\n",
    "import os\n",
    "from langchain_community.llms import Tongyi\n",
    "from langchain_community.llms import SparkLLM\n",
    "from langchain_community.llms import QianfanLLMEndpoint\n",
    "\n",
    "import pandas as pd\n",
    "#导入模版\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "from langchain_community.chat_models import ChatSparkLLM\n",
    "from langchain_community.chat_models.tongyi import ChatTongyi\n",
    "from langchain_community.chat_models import QianfanChatEndpoint\n",
    "\n",
    "#输入三个模型各自的key\n",
    "\n",
    "os.environ[\"DASHSCOPE_API_KEY\"] = \"\"\n",
    "\n",
    "os.environ[\"IFLYTEK_SPARK_APP_ID\"] = \"\"\n",
    "os.environ[\"IFLYTEK_SPARK_API_KEY\"] = \"\"\n",
    "os.environ[\"IFLYTEK_SPARK_API_SECRET\"] = \"\"\n",
    "\n",
    "os.environ[\"QIANFAN_AK\"] = \"\"\n",
    "os.environ[\"QIANFAN_SK\"] = \"\"\n",
    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_ty = Tongyi(temperature=0)\n",
    "model_qf = QianfanLLMEndpoint(model=\"ERNIE-Bot\")\n",
    "chat_qf = QianfanChatEndpoint(model=\"ERNIE-Bot\")\n",
    "chat_xh = ChatSparkLLM()\n",
    "chat_ty= ChatTongyi()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.1.16'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import langchain\n",
    "langchain.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "#! pip install --upgrade langchain -i https://mirrors.aliyun.com/pypi/simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## streamlit 的文件传入\n",
    "\n",
    "https://docs.streamlit.io/develop/api-reference/widgets/st.file_uploader\n",
    "\n",
    "https://github.com/langchain-ai/streamlit-agent/blob/main/streamlit_agent/chat_with_documents.py\n",
    "\n",
    "* 如何获取file_uploader的文件目录"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## streamlit的缓存\n",
    "\n",
    "https://docs.streamlit.io/develop/api-reference/caching-and-state/st.cache_resource\n",
    "\n",
    "https://zgpeace.blog.csdn.net/article/details/135665519\n",
    "\n",
    "https://zhuanlan.zhihu.com/p/614509767?eqid=8f94f74e0004c61900000004649d4c19\n",
    "\n",
    "https://docs.streamlit.io/develop/concepts/architecture/caching\n",
    "\n",
    "* 数据缓存 st.cache_data\n",
    "\n",
    "* 对象缓存 st.cache_resource\n",
    "\n",
    "\n",
    "缓存：cache\n",
    "缓存主要用来解决两个问题：\n",
    "\n",
    "长时间运行的函数重复运行，这会减慢应用程序。\n",
    "对象被重复创建，这使得它们很难在重新运行或会话中持久化。\n",
    "在老版本的Streamlit中，缓存均通过装饰器st.cache来实现。\n",
    " 在新版本中，缓存分成了两个装饰器st.cache_data和st.cache_resource\n",
    "\n",
    "缓存数据：cache_data\n",
    "cache_data适合返回DataFrames、NumPy 数组、str、int、float或者其他可序列化类型的函数。\n",
    "\n",
    "缓存资源：cache_resource\n",
    "缓存资源通常作用于缓存数据库连接和 ML 模型这类全局可用的资源。\n",
    "\n",
    "当函数的返回值不需要是可序列化的，比如数据库连接、文件句柄或线程，此时无法用cache_data，只能用cache_resource。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* classwork 1\n",
    "\n",
    "  1. 实现一个文件上传并输出\n",
    " \n",
    "  2. 实现获得上传文件的目录并通过TextLoader导入，然后实现retriever检索对象的创建\n",
    " \n",
    "  3. 定义函数，通过st.cache_resource装饰器把retriever检索对象放入缓存"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## RAG应用实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [05-21 09:34:45] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /embeddings/embedding-v1\n",
      "[INFO] [05-21 09:34:45] oauth.py:207 [t:11484]: trying to refresh access_token for ak `og6mWr***`\n",
      "[INFO] [05-21 09:34:45] oauth.py:220 [t:11484]: sucessfully refresh access_token\n"
     ]
    }
   ],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain_community.embeddings import QianfanEmbeddingsEndpoint\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.chains.combine_documents import create_stuff_documents_chain\n",
    "from langchain.chains import create_retrieval_chain\n",
    "from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory\n",
    "from langchain.chains import ConversationalRetrievalChain\n",
    "\n",
    "model_qf = QianfanLLMEndpoint(temperature=0.1)\n",
    "loader_txt = TextLoader(r'G:\\云岚宗.txt', encoding='utf8')\n",
    "docs_txt = loader_txt.load()\n",
    "text_splitter_txt = RecursiveCharacterTextSplitter(chunk_size = 384, chunk_overlap = 0, separators=[\"\\n\\n\", \"\\n\", \" \", \"\", \"。\", \"，\"])\n",
    "documents_txt = text_splitter_txt.split_documents(docs_txt)\n",
    "embeddings_qf = QianfanEmbeddingsEndpoint()\n",
    "vectordb = Chroma.from_documents(documents=documents_txt, embedding=embeddings_qf)\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\"\"\"使用下面的语料来回答本模板最末尾的问题。如果你不知道问题的答案，直接回答 \"我不知道\"，禁止随意编造答案。\n",
    "        为了保证答案尽可能简洁，你的回答必须不超过三句话，你的回答中不可以带有星号。\n",
    "        请注意！在每次回答结束之后，你都必须接上 \"感谢你的提问\" 作为结束语\n",
    "        以下是一对问题和答案的样例：\n",
    "            请问：秦始皇的原名是什么\n",
    "            秦始皇原名嬴政。感谢你的提问。\n",
    "        \n",
    "        以下是语料：\n",
    "<context>\n",
    "{context}\n",
    "</context>\n",
    "\n",
    "Question: {input}\"\"\")\n",
    "#创建检索链\n",
    "\n",
    "retriever = vectordb.as_retriever()\n",
    "memory = ConversationSummaryMemory(\n",
    "    llm=model_qf, memory_key=\"chat_history\", return_messages=True\n",
    ")\n",
    "qa = ConversationalRetrievalChain.from_llm(llm=model_qf, retriever=retriever, memory=memory)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "memory.chat_memory.messages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[WARNING] [05-21 09:34:55] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:34:55] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [05-21 09:34:56] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /embeddings/embedding-v1\n",
      "[WARNING] [05-21 09:34:56] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:34:56] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n",
      "[WARNING] [05-21 09:34:57] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:34:57] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "根据提供的上下文，问题\"萧炎是谁\"的目的是询问萧炎的身份或背景信息。可能的原因包括对故事中的人物感兴趣，或者需要了解更多关于这个角色的信息以便更好地理解故事情节。\n"
     ]
    }
   ],
   "source": [
    "res = qa.invoke(\n",
    "    {\"question\": \"萧炎是谁\"}\n",
    ")\n",
    "print(res[\"answer\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[WARNING] [05-21 09:35:06] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:35:06] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [05-21 09:35:07] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /embeddings/embedding-v1\n",
      "[WARNING] [05-21 09:35:07] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:35:07] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n",
      "[WARNING] [05-21 09:35:08] base.py:495 [t:11484]: This key `stop` does not seem to be a parameter that the model `ERNIE-Bot-turbo` will accept\n",
      "[INFO] [05-21 09:35:08] openapi_requestor.py:316 [t:11484]: requesting llm api endpoint: /chat/eb-instant\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "根据原文信息可知，萧炎并没有结婚。文中提到纳兰嫣然是萧炎的未婚妻，但并没有提到他们已经结婚。萧炎和纳兰嫣然的婚约是家族之间的约定，但并没有实现。因此，可以得出结论，萧炎目前并没有结婚。\n"
     ]
    }
   ],
   "source": [
    "res = qa.invoke(\n",
    "    {\"question\": \"他结婚了吗？\"}\n",
    ")\n",
    "print(res[\"answer\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "#help(QianfanEmbeddingsEndpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 基于LCEL实现自动查询 SQL 数据库\n",
    "\n",
    "* sqlite数据库下载\n",
    "\n",
    "链接：https://pan.baidu.com/s/1_nvpBZcrZ_3dgYD1HKX3rg?pwd=kp0h \n",
    "提取码：kp0h"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.utilities import SQLDatabase\n",
    "\n",
    "db = SQLDatabase.from_uri(\"sqlite:///ali_langchain.db\")\n",
    "\n",
    "def get_schema(x):\n",
    "    return db.get_table_info()\n",
    "def run_query(query):\n",
    "    return db.run(query)\n",
    "def get_sql(x):\n",
    "    return x.split(\"```sql\")[1].split(\"```\")[0]\n",
    "\n",
    "template =\"\"\"根据下面的表格结构，编写一个 SQL 查询来回答用户的问题：\n",
    "{schema}\n",
    "\n",
    "问题：{question}\n",
    "给出的SQL查询应有如下形式:\n",
    "```sql\n",
    "...\n",
    "```\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "chain_sql=({\"schema\":RunnableLambda(get_schema),\"question\":RunnablePassthrough()}|prompt|ChatTongyi()|StrOutputParser()\n",
    "           |RunnableLambda(get_sql))\n",
    "\n",
    "template_1 =\"\"\"根据下面的表格结构、问题、SQL 查询和 SQL 响应，编写一个自然语言回应：\n",
    "{schema}\n",
    "\n",
    "问题：{question}\n",
    "SQL 查询：{query}\n",
    "SQL 响应：{response}\"\"\"\n",
    "prompt_response = ChatPromptTemplate.from_template(template_1)\n",
    "\n",
    "chain_sql2=({\"question\":RunnablePassthrough(),\"query\":chain_sql}|RunnablePassthrough.assign(schema=get_schema,response=lambda x: run_query(x[\"query\"]))|prompt_response|ChatTongyi())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nCREATE TABLE alidata (\\n\\t\"index\" INTEGER, \\n\\tuser INTEGER, \\n\\tbrand INTEGER, \\n\\tbehavr INTEGER, \\n\\tdate TEXT\\n)\\n\\n/*\\n3 rows from alidata table:\\nindex\\tuser\\tbrand\\tbehavr\\tdate\\n0\\t10944750\\t13451\\t0\\t06/04\\n1\\t10944750\\t13451\\t2\\t06/04\\n2\\t10944750\\t13451\\t2\\t06/04\\n*/'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.get_table_info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='在这个alidata表中，有9531个不同的品牌。', response_metadata={'model_name': 'qwen-turbo', 'finish_reason': 'stop', 'request_id': '8644bd1e-0ceb-9a94-bf66-176e57655bc0', 'token_usage': {'input_tokens': 188, 'output_tokens': 15, 'total_tokens': 203}}, id='run-e3e8cb73-b676-48ca-b4a2-93fd28804d39-0')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain_sql2.invoke(\"表里有多少brand？\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nSELECT brand, COUNT(*) AS record_count\\nFROM alidata\\nGROUP BY brand\\nORDER BY brand\\n'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain_sql.invoke(\"表里所有brand的记录数,然后对这些记录数取平均值\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
